Managing Retries
Learn how to manage the retries of an API call in different situations.
Introduction#
Digital applications receive millions of requests every second, but some requests fail to complete their request-response cycle for many possible reasons. In such cases, the client generally receives one of the following HTTP status codes:
| HTTP Status Code | Reason |
|------------------|--------|
| 408 | Request timeout |
| 429 | Too many requests |
| 500 | Internal server error |
| 502 | Gateway or proxy server receives an unexpected response |
| 503 | Service temporarily unavailable |
| 504 | Gateway or proxy server timeout |
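The transient codes above can be collected into a small allowlist that a client consults before retrying. A minimal Python sketch (the set mirrors the table above; a real client may tune this policy for the specific API it calls):

```python
# Status codes from the table above that generally warrant a retry.
RETRYABLE_STATUS_CODES = {408, 429, 500, 502, 503, 504}

def is_retryable(status_code: int) -> bool:
    """Return True if the status code suggests a later retry may succeed."""
    return status_code in RETRYABLE_STATUS_CODES
```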
Although each status code has its own story, we organized them into the following four categories. In the first two categories, the client doesn’t get a response from the server and needs to retry. In the last two categories, the client receives a response with an error and needs to retry after some modifications:
Request lost: The first scenario is when a request is initiated by the client but never received by the target service. It’s as if the request never occurred.
In the scenario above, the client doesn’t get a response. For example, requests can get lost due to congestion at some router while en route to the service.
Response lost: The second scenario is where the client initiates the request, which is processed by the target service. The service then sends the response to the requesting client, but the response never arrives.
Service unresponsive: The third scenario is when the target service can’t process the request appropriately, possibly because of an error in a downstream service. The error status and an appropriate message are sent back to the client.
Response with error code: The fourth scenario can be where the client receives a response with an error from the target service due to a bad request or parameters. In this case, the client must address this error before retrying.
In the first two scenarios, the client can’t tell which case occurred because no response arrives; the client application simply shows an error indicating that the connection to the service has failed. In the third scenario, the server sends an error response back to the client because it can’t process the request. In all three of these scenarios, the client’s only option is to retransmit the same request. This process is called a retry. The last scenario requires the client to address the errors in the request before retrying.
The following section will explain the side effects of the retries. Also, we’ll emphasize what issues arise when the client resends the same request without fixing the errors received from the server.
How do retries lead to issues?#
This section primarily highlights the issues that arise because of retries. Although a rate limiter caps how many requests (new or retried) a client can make in a specified time, the issues mostly arise before that threshold is reached. Let's go through the issues below:
In the first three scenarios mentioned above, a request either never reaches the server, or it reaches the server but the client receives no response or an error response caused by the server. The natural reaction from the client is to retry, which can lead to unnecessary utilization of computational resources at the service.
Another scenario is where numerous clients perform retries simultaneously. As a result, the network might be congested when many requests try to reach the target service concurrently.
An important case is when the client sends useless retries: requests whose response from the server will never change. For instance, a malformed request will fail every time it is resent. In these scenarios, the client should stop retrying because the response will remain the same; the retries accomplish nothing, yet they waste bandwidth, overburden the server, and can cause network congestion.
Sending an excessive number of retries can hurt the client as well. For free APIs, doing this might cause the client's requests to be throttled; for paid services, the client might be wasting its API call quota.
The following section will explain how retries work and solve the discussed issues.
How do we manage retries?#
In the section above, we have discussed scenarios explaining when clients need to retry and when too many retries create a mess. This section will discuss solutions to prevent servers from being overburdened and avoid network congestion. We’ll also look at some techniques for discouraging retries where they aren’t needed.
The backoff algorithm #
One solution is adding a delay to each client's wait time so that the client does not immediately run into the same failure again. We can compute this delay using the exponential backoff algorithm. This algorithm uses a multiplicative factor to reduce the retry rate; in other words, the interval grows with each failed attempt. The formula used to reduce the retry rate is:

wait_time = base × multiplier^n

Here, base is the initial wait time, multiplier is the growth factor (commonly 2), and n is the number of retries attempted so far. One drawback is that every client computes the same deterministic delays, so clients that failed at the same moment will also retry at the same moments.
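A minimal Python sketch of this computation, assuming a base delay of 1 second and a multiplier of 2 (common defaults, not mandated values), with a cap so the delay doesn't grow without bound:

```python
def backoff_delay(attempt: int, base: float = 1.0, multiplier: float = 2.0,
                  max_delay: float = 60.0) -> float:
    """Wait time before retry number `attempt` (0-based), capped at max_delay."""
    return min(max_delay, base * (multiplier ** attempt))
```

For example, the delay before the first retry is 1 second, before the fourth retry it is 8 seconds, and very late attempts are capped at 60 seconds.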
The backoff algorithm with random jitter#
We can address this problem by adding a random jitter to each client's retry wait time, rather than having every client wait the same deterministic delay calculated by the backoff algorithm. The random jitter is computed separately and added to the value calculated from the backoff algorithm. By doing this, we spread out the retries and reduce the burden on the server. Let's take a look at the following illustration after adding random jitter:
The illustration above shows how the combination of the backoff algorithm and random jitter produces the retry waiting time. Here, t0 indicates the time when clients make their requests for the first time, and t1 specifies a client's previous retry time. For instance, t1 for Client 1 is t0 + x0. With jitter added, the clients no longer need to retry repeatedly, as shown in the illustration above.
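A sketch of the combination described above: the deterministic backoff value plus a separately drawn random jitter. Here the jitter is uniform in [0, base], which is an assumption; real systems choose various jitter ranges:

```python
import random

def retry_delay(attempt: int, base: float = 1.0, max_delay: float = 60.0) -> float:
    """Exponential backoff value plus a separately computed random jitter."""
    backoff = min(max_delay, base * (2 ** attempt))
    jitter = random.uniform(0, base)  # breaks the synchrony between clients
    return backoff + jitter
```

Because the jitter term differs per call, two clients that failed at the same instant will almost certainly wake up at different times.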
Capping the number of retries allowed in a certain period is another way to keep retries in check. Retries from various clients might all be processed the first time they reach the server; alternatively, the server can process a few of the requests the first time and the rest on the next retry.
Let's discuss an example where the server processes retries from various clients in two cycles, given that the retries are caused by transient errors: recoverable, temporary failures such as a momentarily unavailable connection or a timeout. We'll also see how this approach reduces the number of retries.
In this example, the server first processes the request that Client 3 sends on its first retry. On the second retry, the server processes the remaining retried requests. As a result, breaking the synchrony of client requests helps manage the instantaneous load on the server and, on average, helps the clients by reducing retry-induced delays.
Note: Sometimes, the server can also specify the waiting time for the next retry. This information is sent to the client in the Retry-After header. Although it isn’t practical for all the discussed HTTP status codes, it is suitable for a few, such as the 429 and 503 status codes.
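A client can honor this header when present and fall back to its own backoff delay otherwise. A sketch, treating response headers as a plain dict and handling only the delay-in-seconds form of Retry-After (the header may also carry an HTTP-date, which this sketch does not parse):

```python
def wait_time(response_headers: dict, fallback_delay: float) -> float:
    """Prefer the server-specified Retry-After delay; otherwise use our own."""
    retry_after = response_headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)  # delay-seconds form, e.g. "120"
        except ValueError:
            pass  # HTTP-date form not handled in this sketch
    return fallback_delay
```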
Point to Ponder
Question
Why are we adding latency through jitter if it slows down the client?
The purpose of random jitter is to break the synchrony between different clients’ requests or stop a client from sending too many requests too soon (because if a service isn’t working, it might need some time to recover). It adds latency for specific calls, but on average, such a mechanism spreads out requests and provides lower average latency.
Useless retries prevention#
The next issue is useless retries, which can be addressed by inspecting the HTTP status codes.
If a client receives a 4xx or 5xx status code other than those listed in the table at the start, it should not blindly retry. Before retrying, the client should either fix whatever can be fixed in the request, or wait long enough for the service to recover from its internal error. Repeating the same request won’t change the response, but it can overload the target service and cause network congestion.
Let's discuss one example each of client-side and server-side errors. We’ll start with a client-side error, where the client requests a particular resource that is no longer available. In this case, the client gets the 410 status code and doesn’t need to retry, because a retry will get the same response from the server.
Let's take a look at an example of server-side errors where the client sends a request that does not satisfy the extension policy (resource access policy), and the server responds with a 510 status code. This response includes details such as the server not supporting extensions that are asked for in the request. If the client retries the same request, it will get the same response each time until the request is altered according to the details in the 510 response. Some familiar HTTP status codes where retrying is not useful are listed in the table below:
| HTTP Status Code (4xx) | Reason | HTTP Status Code (5xx) | Reason |
|------------------------|--------|------------------------|--------|
| 400 | Invalid request that does not follow the rules | 501 | Server does not support the functionality required by the request |
| 401 | Invalid authentication credentials | 505 | HTTP version not supported |
| 403 | Refuses to authorize the request | 507 | Insufficient storage |
| 404 | Requested page does not exist | 508 | Infinite loop detected |
Note: The list of status codes above is not exhaustive. The general guideline is that the error codes with 4xx are those where the client needs to fix something, while 5xx errors are those where the service needs to act to resolve the issue.
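Putting these pieces together, a retry loop should retry only the transient codes from the first table, cap the number of attempts, and back off with jitter between attempts. A sketch under those assumptions; `make_request` is a hypothetical callable that performs the API call and returns a status code, and `sleep` is injectable so the loop can be tested without real delays:

```python
import random
import time

RETRYABLE = {408, 429, 500, 502, 503, 504}  # transient codes from the first table

def call_with_retries(make_request, max_retries: int = 5, base: float = 1.0,
                      sleep=time.sleep) -> int:
    """Call `make_request` until success, a non-retryable error, or retry exhaustion."""
    for attempt in range(max_retries + 1):
        status = make_request()
        if status < 400 or status not in RETRYABLE or attempt == max_retries:
            return status  # success, useless-to-retry error, or out of retries
        # Exponential backoff plus random jitter before the next attempt.
        sleep(min(60.0, base * (2 ** attempt)) + random.uniform(0, base))
    return status
```

Note how a 404 returns immediately (retrying would be useless), while a 503 is retried with growing, jittered delays.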
Summary #
Many things can go wrong in the request-response cycle of an API. Due to the nature of a distributed system, clients might not always be able to find the root cause of such issues. Often, a request retry is considered a remedy for all these problems. However, the improper use of retries can hurt the clients, the network, or the service in different ways. We discussed some techniques (request cap and backoff) to utilize the retry facility efficiently. Idempotency is a way to tackle the side effects of retries (such as potential duplication of work and data) that we’ll see in a different lesson.